Keyword Extraction using Multiple Novel Features

Authors

  • Shaohui YANG
  • Bo ZHANG
  • Shun LI
  • Chaohui YU
  • Qianting HAO
Abstract

In this paper, we propose a novel approach to keyword extraction. Unlike previous methods, which identify keywords from the document alone, this approach also draws on Wikipedia knowledge and document genre. Keywords are extracted by a classification model that uses not only traditional word-based features but also features derived from Wikipedia knowledge and document genre. In our experiment, the approach outperforms previous keyword extraction models in terms of precision and recall, and breaks through the plateau previously reached in the field.
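
The abstract does not specify the exact features or the classifier, so the following is only a minimal sketch of the general idea: each candidate word gets traditional word-based features (term frequency, position of first occurrence) alongside a Wikipedia-derived feature and a document-genre indicator, and a linear model scores the result. The `WIKI_KEYPHRASENESS` table, the `GENRES` labels, and the hand-set `WEIGHTS` are all hypothetical placeholders, not values from the paper; in practice the weights would be learned from labeled documents.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for Wikipedia knowledge: a per-term score such as how
# often the term is used as link anchor text. Values here are made up.
WIKI_KEYPHRASENESS = {"keyword": 0.4, "extraction": 0.3, "wikipedia": 0.6}

# Hypothetical genre labels; the abstract does not list the genres used.
GENRES = ("news", "scientific")

# Hypothetical hand-set weights; a real system would learn these.
WEIGHTS = {
    "tf": 2.0,
    "first_occurrence": 1.0,
    "wiki_keyphraseness": 3.0,
    "genre=news": 0.0,
    "genre=scientific": 0.2,
}
BIAS = -1.0


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_features(text, genre):
    """Build one feature dictionary per distinct word in the document."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    first_pos = {}
    for i, tok in enumerate(tokens):
        first_pos.setdefault(tok, i)
    features = {}
    for word, count in counts.items():
        feats = {
            # Traditional word-based features.
            "tf": count / len(tokens),
            "first_occurrence": 1.0 - first_pos[word] / len(tokens),
            # Wikipedia-based feature (hypothetical lookup table).
            "wiki_keyphraseness": WIKI_KEYPHRASENESS.get(word, 0.0),
        }
        # Document genre as a one-hot indicator shared by all candidates.
        for g in GENRES:
            feats[f"genre={g}"] = float(g == genre)
        features[word] = feats
    return features


def keyword_probability(feats):
    """Score a candidate with a logistic (sigmoid) linear model."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def extract_keywords(text, genre, threshold=0.5):
    """Return words whose score reaches the threshold, best first."""
    scored = {w: keyword_probability(f) for w, f in candidate_features(text, genre).items()}
    kept = [w for w, p in scored.items() if p >= threshold]
    return sorted(kept, key=lambda w: -scored[w])


if __name__ == "__main__":
    doc = ("Keyword extraction uses Wikipedia knowledge. "
           "Keyword extraction is a classification task.")
    print(extract_keywords(doc, genre="scientific"))
```

Treating genre as an explicit feature lets a single model weight the other features differently across document types once interaction terms or a non-linear classifier are used; the linear form above is chosen only for brevity.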

Related resources

Keyword Extraction and Headline Generation Using Novel Word Features

We introduce several novel word features for keyword extraction and headline generation. These new word features are derived according to the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the...

Integrating Semantic Relatedness and Words' Intrinsic Features for Keyword Extraction

Keyword extraction attracts much attention for its significant role in various natural language processing tasks. While some existing methods for keyword extraction have considered using a single type of semantic relatedness between words or inherent attributes of words, almost all of them ignore two important issues: 1) how to fuse multiple types of semantic relations between words into a unifor...

Automatic Keyword Extraction from Documents Using Conditional Random Fields

Keywords are a subset of words or phrases from a document that can describe the meaning of the document. Many text mining applications can take advantage of them. Unfortunately, a large portion of documents still do not have keywords assigned. On the other hand, manual assignment of high-quality keywords is expensive, time-consuming, and error-prone. Therefore, most algorithms and systems aimed t...

A Knowledge-Base Oriented Approach for Automatic Keyword Extraction

Automatic keyword extraction is an important subfield of information extraction. It is a difficult task, for which numerous techniques and resources have been proposed. In this paper, we propose a generic approach to extract keywords from documents using encyclopedic knowledge. Our two-step approach first relies on a classification step for identifying candidate keywords followed b...

A Fuzzy Logic Based Improved Keyword Extraction From Meeting Transcripts

Keyword extraction is the process of assigning keywords to a document, where the important words are selected automatically by the system. The proposed framework extracts keywords from meeting transcripts using a fuzzy logic method. First, the input is preprocessed. The preprocessed data is then passed to the feature extraction method. In this method three fea...

Publication date: 2014